Abstract

The detection of influential observations for the standard least squares regression
model is a question that has been extensively studied. LAD regression diagnostics
offer alternative approaches whose main feature is robustness. In this paper a
new approach for nonparametric detection of influential observations in LAD
regression models is presented and compared with other classical methods of diagnostics.

Key words: Least Absolute Deviations Regression, Robustness, Outliers, Leverage 
Points. 



1 Introduction 



The robustness of LAD to low-leverage outliers, and its susceptibility to
high-leverage outliers, have been extensively studied in the literature [2,3,4]. In this
paper we propose a method for nonparametric detection of such influential
observations by the use of a technique derived from LAD regression. Robust methods
based on the $L_1$-norm have been proposed, for example, in [5,7]. The approach
presented here considers suitable perturbations of a given data set and allows
the detection of high-leverage observations and outliers from a new viewpoint.
These methods meet natural requirements for robustness, and give a new
tool for the analysis of data.

Let $S \subset \mathbb{R}^{p+1}$ be a finite discrete set of points. In statistics, such a set
may represent observations in $p + 1$ variables. Denote the elements of $S$ as
$(x_{i1}, \ldots, x_{ip}, y_i)$, where the last variable is explained from the preceding ones
by a linear regression model:

$$y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i \quad \text{for } i = 1, \ldots, n,$$

where $p$ is the number of explanatory variables, $\varepsilon_i$ are error terms, or
deviations, and $n$ is the number of observations. The LAD regression model is
determined by minimizing the sum of the absolute deviations, i.e., the vector
$(\beta_0, \beta_1, \ldots, \beta_p) \in \mathbb{R}^{p+1}$ is determined by minimizing over
$\beta_0, \beta_1, \ldots, \beta_p$ the function

$$F(\beta_0, \beta_1, \ldots, \beta_p) := \sum_{i=1}^{n} \left| y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right|.$$



When a linear LAD regression model is fitted, the hyperplane always passes 
through at least p+1 points [1], although the solution may be non-unique. 
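
The property that an LAD hyperplane can be chosen through at least $p+1$ data points suggests a simple, if inefficient, way to compute a fit for small data sets. The following is a minimal sketch for $p = 1$, not the authors' Splus code: the function name `lad_line` and the toy data are assumptions made for illustration, and the brute-force search over pairs of points stands in for the linear programming methods normally used.

```python
from itertools import combinations

def lad_line(points):
    """Return (b0, b1, sad) for a least-absolute-deviations line y = b0 + b1*x.

    Brute force: an LAD line can always be chosen through two data points,
    so every pair with distinct x is tried and the line with the smallest
    sum of absolute deviations (sad) is kept. Ties go to the first pair found.
    """
    best = None
    for (x1, y1), (x2, y2) in combinations(points, 2):
        if x1 == x2:
            continue  # a vertical line is not a regression line
        b1 = (y2 - y1) / (x2 - x1)
        b0 = y1 - b1 * x1
        sad = sum(abs(y - b0 - b1 * x) for x, y in points)
        if best is None or sad < best[2]:
            best = (b0, b1, sad)
    if best is None:
        raise ValueError("need at least two points with distinct x")
    return best

# Four points on y = 2x + 1 and one gross outlier at x = 2.
data = [(0, 1), (1, 3), (2, 30), (3, 7), (4, 9)]
b0, b1, sad = lad_line(data)
print(b0, b1, sad)  # 1.0 2.0 25.0
```

In this toy example the fitted line ignores the outlier and passes through the four remaining points, which also shows that the number of interpolated points can exceed $p+1$ when the data are not in general position.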

For our purposes, we assume that for every dataset and for each subset of
it that we deal with, the hyperplane which fits the linear LAD regression model
is unique, as is the observation with maximal absolute deviation. These
assumptions are reasonable for datasets whose size is sufficiently large and/or
whose data contain sufficiently many significant digits. We suppose also that the
dataset is such that no $p + 2$ points lie in the same hyperplane. With these
assumptions the linear LAD regression model is unique and it passes through
exactly $p+1$ points. Furthermore, if $n > p + 1$, there is always a point which
does not belong to the regression hyperplane, and which therefore has a positive
absolute deviation.

Consider the $n$ datasets formed by all possible subsets of $S$ of size $n - 1$.
Under the above assumptions, for each dataset we have a unique solution. For
each case, we assign the score 1 to each point through which the fitted model
passes and the score 0 to the other points. We define the final score of each point
as the sum of its scores over all models fitting the $n$ datasets. This score is produced
by the repeated use of the same points, each time considering a different subset
of the original data set, so in a certain sense by bootstrapping the linear LAD
regression model.

The point $(x_{k1}, x_{k2}, \ldots, x_{kp}, y_k)$ will also be denoted by $k$, and its score will be
denoted by $L(k)$.

Similarly, we may define another complementary score function, denoted by
$O(k)$, in the following way. Consider again the $n$ datasets formed by all
possible subsets of $S$ of size $n - 1$. For each subset we consider the LAD
regression line and we give the score 1 to the unique (according to the above
assumptions) point which maximizes the absolute distance from the LAD
regression line.

We define the score $O(k)$ as the sum (over all $n$ possible subsets of $S$ of size
$n - 1$) of the scores arising from the LAD regression lines.
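
The two score functions can be sketched in a few lines for the case $p = 1$. The code below is an illustration under stated assumptions, not the authors' implementation: the names `lad_fit` and `scores`, the brute-force fit through pairs of points, and the small synthetic data set (eight points near $y = 2x + 1$ plus one gross outlier) are all introduced here for the example.

```python
from itertools import combinations

def lad_fit(points):
    """Brute-force LAD line; returns (b0, b1, i, j), with i, j the points on the line."""
    best = None
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue
        b1 = (y2 - y1) / (x2 - x1)
        b0 = y1 - b1 * x1
        sad = sum(abs(y - b0 - b1 * x) for x, y in points)
        if best is None or sad < best[0]:
            best = (sad, b0, b1, i, j)
    return best[1:]

def scores(points):
    """Leave-one-out scores L and O for p = 1, as lists indexed like points."""
    n = len(points)
    L, O = [0] * n, [0] * n
    for left_out in range(n):
        idx = [k for k in range(n) if k != left_out]   # one subset of size n - 1
        sub = [points[k] for k in idx]
        b0, b1, i, j = lad_fit(sub)
        L[idx[i]] += 1        # the two points the fitted line passes through
        L[idx[j]] += 1
        off_line = [k for k in range(len(sub)) if k not in (i, j)]
        far = max(off_line, key=lambda k: abs(sub[k][1] - b0 - b1 * sub[k][0]))
        O[idx[far]] += 1      # the point with the largest absolute deviation
    return L, O

noise = [0.1, -0.2, 0.05, 0.3, -0.15, 0.2, -0.1, 0.25]
pts = [(float(x), 2 * x + 1 + e) for x, e in enumerate(noise)]
pts.append((3.5, 50.0))       # index 8: a gross outlier in the middle of the x range
L, O = scores(pts)
print("L:", L)
print("O:", O)
```

For this data set the outlier collects $O = n - 1 = 8$ and $L = 0$, since it has the largest deviation in every subset that contains it and is never interpolated by the fitted line.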

In Section 2 we discuss some elementary properties of the LAD regression
model. These properties will justify our algorithms for the detection of
outliers and leverage points presented in Section 3. In Section 4 we discuss some
examples, and compare the results with those obtained using other classical
methods.



2 Preliminary Considerations 

Under the above assumptions, the sum of the $L$ scores of all points is $n(p + 1)$, and
the sum of the $O$ scores is $n$, so, from the random variable viewpoint, $E(L(k)) =
p + 1$ and $E(O(k)) = 1$. Now suppose that we have a set of observations, all
concentrated in a region, and an isolated observation horizontally very far from
the others but such that the line of the LAD regression model will pass through
it (a typical leverage point). It is likely that the $L$ score of this observation
will be quite high. So a large $L$ score indicates a leverage point. On
the other hand, suppose we have a dataset in which all points lie roughly
in a hyperplane, and a further point far above them (an outlier). The LAD
regression model will be very near to this hyperplane, and the score $L$ of the
outlier will probably be zero, but the score $O$ will probably be $n - 1$.
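
The totals stated at the start of this section follow from a simple count, written out here as a short derivation. Each of the $n$ leave-one-out fits passes through exactly $p+1$ points and has exactly one point of maximal absolute deviation. The expectation is read here as an average over a point $k$ chosen uniformly at random among the $n$ points, which is an assumption made for the derivation.

```latex
\begin{align*}
\sum_{k=1}^{n} L(k) &= \underbrace{(p+1) + \cdots + (p+1)}_{n \text{ fits}} = n(p+1),
&
\sum_{k=1}^{n} O(k) &= \underbrace{1 + \cdots + 1}_{n \text{ fits}} = n, \\
E(L(k)) &= \frac{1}{n} \sum_{k=1}^{n} L(k) = p+1,
&
E(O(k)) &= \frac{1}{n} \sum_{k=1}^{n} O(k) = 1.
\end{align*}
```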

To justify these arguments we state the following theorems. 

Theorem 1 Let $(x_{11}, \ldots, x_{1p}, y_1), \ldots, (x_{n1}, \ldots, x_{np}, y_n)$ be $n$ points in $\mathbb{R}^{p+1}$.
Let $(x_{n+1,1}, \ldots, x_{n+1,p}, y_{n+1})$ be an additional point, such that $(x_{n+1,1}, \ldots, x_{n+1,p})$
belongs to the interior of the convex hull determined by the set $\{(x_{11}, \ldots, x_{1p}),
\ldots, (x_{n1}, \ldots, x_{np})\}$. If $x_{n+1,1}, \ldots, x_{n+1,p}$ are fixed and $|y_{n+1}|$ is sufficiently large,
a hyperplane relative to a linear LAD regression model does not pass through
$(x_{n+1,1}, \ldots, x_{n+1,p}, y_{n+1})$.

Proof. Suppose $p = 1$ and that $c < y_i < d$ for $i = 1, \ldots, n$. The convex hull
hypothesis reduces to

$$a = \min_{i=1,\ldots,n} \{x_i\} < x_{n+1} < \max_{i=1,\ldots,n} \{x_i\} = b.$$

Let $y = \beta_0 + \beta_1 x$ be the line of the linear LAD regression model. Let $l_1$
be the horizontal line $y = (c + d)/2$. The sum of the absolute deviations of
$(x_1, y_1), \ldots, (x_{n+1}, y_{n+1})$ from $l_1$ does not exceed $(d - c)(n + 1) +
|y_{n+1}| + |d| + |c|$. So

$$F(\beta_0, \beta_1) < (d - c)(n + 1) + |d| + |c| + |y_{n+1}|.$$

On the other hand, if a line $y = \beta_0^* + \beta_1^* x$ passes through $(x_{n+1}, y_{n+1})$ and
is a linear LAD regression model, it will also pass through another point $(x_i, y_i)$
for a suitable $i$, so for sufficiently large $|y_{n+1}|$ we have $|\beta_1^*| > |y_{n+1}|/(b - a)$. Hence
there exists $\alpha > 1$ such that for sufficiently large $|y_{n+1}|$,

$$F(\beta_0^*, \beta_1^*) > \alpha |y_{n+1}|.$$

For sufficiently large $|y_{n+1}|$ we have

$$F(\beta_0^*, \beta_1^*) > \alpha |y_{n+1}| > (d - c)(n + 1) + |d| + |c| + |y_{n+1}| > F(\beta_0, \beta_1),$$

and therefore the line relative to the LAD regression model cannot pass
through $(x_{n+1}, y_{n+1})$.

For $p > 1$ the proof is similar. □

Theorem 2 Let $(x_{11}, \ldots, x_{1p}, y_1), \ldots, (x_{n1}, \ldots, x_{np}, y_n)$ be $n$ points in $\mathbb{R}^{p+1}$.
Let $(x_{n+1,1}, \ldots, x_{n+1,p}, y_{n+1})$ be an additional point. If $y_{n+1}$ is fixed and $\sum_{i=1}^{p} |x_{n+1,i}|$
is sufficiently large, a linear LAD regression model will pass through $(x_{n+1,1},
\ldots, x_{n+1,p}, y_{n+1})$.

Proof. The proof is an exercise, and the approach is similar to the proof of
Theorem 1. □
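
Both statements can be checked numerically on a small example with $p = 1$. The sketch below is illustrative only: the helper `lad_pair`, the synthetic data and the particular values 50 and 1000 chosen for the extreme coordinates are assumptions, and "sufficiently large" is verified only for this data set.

```python
from itertools import combinations

def lad_pair(points):
    """Indices (i, j) of the two points defining a brute-force LAD line (p = 1)."""
    best = None
    for i, j in combinations(range(len(points)), 2):
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue
        b1 = (y2 - y1) / (x2 - x1)
        b0 = y1 - b1 * x1
        sad = sum(abs(y - b0 - b1 * x) for x, y in points)
        if best is None or sad < best[0]:
            best = (sad, i, j)
    return best[1], best[2]

noise = [0.1, -0.2, 0.05, 0.3, -0.15, 0.2, -0.1, 0.25]
base = [(float(x), 2 * x + 1 + e) for x, e in enumerate(noise)]
extra = len(base)                       # index of the additional point

# Setting of Theorem 1: x inside the range of the other x values, |y| large.
pair_outlier = lad_pair(base + [(3.5, 50.0)])
# Setting of Theorem 2: y fixed, |x| large.
pair_leverage = lad_pair(base + [(1000.0, 0.0)])

print(extra in pair_outlier, extra in pair_leverage)  # False True
```

The fitted line avoids the additional point in the first setting and is forced through it in the second, in agreement with the two theorems.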

Theorem 3 Let $(x_{11}, \ldots, x_{1p}, y_1), \ldots, (x_{n1}, \ldots, x_{np}, y_n)$ be $n$ points in $\mathbb{R}^{p+1}$,
with $n > p + 1$. Let $L(k)$ and $O(k)$, for $k = 1, \ldots, n$, be defined as in Section 1.
Then $L(k) + O(k) \le n - 1$.

Proof. This is a consequence of the fact that for each of the $n$ subsets the
scores are shared among distinct points and, by the above assumptions, a
point cannot collect a score for both $L$ and $O$, since for $L$ it must have zero
residual, and for $O$ a strictly positive absolute residual. Moreover, each point appears
exactly $n - 1$ times in the $n$ subsets. □



3 The Algorithms 

In this section we propose two algorithms based on the previous section. The 
aim of Algorithm 1 and Algorithm 2 is to detect leverage points and outliers 
respectively. 
Algorithm 1.

(1) Consider a data set $S$ of size $n$. Let $A$ and $B$ be empty sets.

(2) Let $m$ be the size of $S$.

(3) Consider all $m$ subsets of $S$ of size $m - 1$ and fit the LAD regression
model for each subset.

(4) Compute $L(k)$ for each point of $S$ and select $k_1 \in S$ which maximizes
$L(k)$.

(5) If $L(k_1)$ > |(m - 1) and $L(k_1)$ > \(n - 1) then move $k_1$ into $B$ and move
any points of $A$ back into $S$; otherwise move $k_1$ into $A$.

(6) If the size of $S$ does not exceed j^n then the process stops and $B$ represents
the leverage points; otherwise go to step 2.

In this algorithm, the elements of $S$ are transferred either into a set $B$ of leverage
points, or into a temporary set $A$, which holds points that did not reach a sufficient
score to be classified as leverage points and that may be reconsidered after
another point has been detected. This device avoids the masking effect.

The discriminating values |(m - 1), |(n - 1) and -^n for the score function $L$
have been determined empirically, by testing on several data sets and several
combinations of values. They have, however, a natural interpretation. When
there is a unique leverage point, almost all $m$ regression models detect it, so
its $L$ score is near the maximum. When there are several leverage points, the
scores may be very different, and the masking effect can produce relatively
small scores. Finally, we take into account the size of $S$ to determine how
many leverage points a data set may have. The process stops when the size of
the set $S$ does not exceed j^n, so with our method a data set cannot have more
than j^n leverage points.
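
The loop structure of Algorithm 1 can be sketched as follows for $p = 1$. This is a hedged illustration, not the authors' Splus code: the function names, the brute-force LAD fit and the synthetic data are assumptions, and the three discriminating values are left as caller-supplied parameters (`c_m`, `c_n`, `stop_fraction`, all hypothetical names). The values 0.75 and 0.5 used in the call are arbitrary and are not the values used in the paper.

```python
from itertools import combinations

def l_scores(points):
    """Leave-one-out L scores for p = 1 (brute-force LAD line through two points)."""
    n = len(points)
    L = [0] * n
    for left_out in range(n):
        idx = [k for k in range(n) if k != left_out]
        best = None
        for i, j in combinations(idx, 2):
            (x1, y1), (x2, y2) = points[i], points[j]
            if x1 == x2:
                continue
            b1 = (y2 - y1) / (x2 - x1)
            b0 = y1 - b1 * x1
            sad = sum(abs(points[k][1] - b0 - b1 * points[k][0]) for k in idx)
            if best is None or sad < best[0]:
                best = (sad, i, j)
        L[best[1]] += 1
        L[best[2]] += 1
    return L

def leverage_points(points, c_m, c_n, stop_fraction):
    """Sketch of Algorithm 1; assumes every subset has two distinct x values."""
    n = len(points)
    S = list(range(n))       # indices still under examination
    A, B = [], []            # temporary set, detected leverage points
    while len(S) > stop_fraction * n:
        m = len(S)
        L = l_scores([points[k] for k in S])
        pos = max(range(m), key=L.__getitem__)   # first index with maximal L
        k1 = S.pop(pos)
        if L[pos] > c_m * (m - 1) and L[pos] > c_n * (n - 1):
            B.append(k1)
            S = sorted(S + A)    # reconsider the points set aside so far
            A = []
        else:
            A.append(k1)
    return B

noise = [0.1, -0.2, 0.05, 0.3, -0.15, 0.2, -0.1, 0.25]
pts = [(1000.0, 0.0)] + [(float(x), 2 * x + 1 + e) for x, e in enumerate(noise)]
found = leverage_points(pts, c_m=0.75, c_n=0.75, stop_fraction=0.5)
print(found)
```

In this example the isolated point at $x = 1000$ (index 0) attains the maximal score $L = n - 1$ in the first pass and is therefore the first point moved into $B$; which further points are flagged depends on the chosen parameters.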

Algorithm 2.

(1) Consider a data set $S$ of size $n$. Let $C$ and $D$ be empty sets.

(2) Initialize the last maximum score (LMS) to 0.

(3) Let $m$ be the size of $S$.

(4) Consider all $m$ subsets of $S$ of size $m - 1$ and fit the LAD regression
model for each subset.

(5) Compute $O(k)$ for each point of $S$ and select $k_1$ which maximizes $O(k)$.

(6) If $O(k_1) = m - 1$

a) Then if $O(k_1) = \mathrm{LMS} - 1$ or $\mathrm{LMS} = 0$ then move $k_1$ into $D$, put
$\mathrm{LMS} = O(k_1)$ and move any points of $C$ back into $S$; otherwise the
process stops and $D$ represents the outliers.

b) Otherwise move $k_1$ into $C$.

(7) If the size of $S$ does not exceed |n then the process stops and $D$ represents
the outliers; otherwise go to step 3.
In this algorithm, the set $D$ contains the points classified as outliers and the
set $C$ contains the points that did not reach the score required to be classified as
outliers, and that may be reconsidered in further steps until the
algorithm stops.

Note that the two proposed algorithms have a similar structure. The main
difference is a feature of Algorithm 2: the outliers have decreasing
scores $O_1$, $O_1 - 1$, $O_1 - 2$, and so on. The algorithm stops when this sequence
cannot be continued.

The process also stops when the size of set S does not exceed |n, so here a 
data set cannot have more than \n outliers. 

The discriminating values for the score function $O$ have also been determined
empirically.
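
Algorithm 2 can be sketched in the same way for $p = 1$. Again this is an illustration under assumptions rather than the authors' code: the names `o_scores` and `outliers`, the brute-force LAD fit and the synthetic data are introduced here, and the stopping fraction is a caller-supplied parameter (`stop_fraction`, a hypothetical name) whose value 0.5 in the call is arbitrary.

```python
from itertools import combinations

def o_scores(points):
    """Leave-one-out O scores for p = 1 (brute-force LAD line through two points)."""
    n = len(points)
    O = [0] * n
    for left_out in range(n):
        idx = [k for k in range(n) if k != left_out]
        best = None
        for i, j in combinations(idx, 2):
            (x1, y1), (x2, y2) = points[i], points[j]
            if x1 == x2:
                continue
            b1 = (y2 - y1) / (x2 - x1)
            b0 = y1 - b1 * x1
            dev = {k: abs(points[k][1] - b0 - b1 * points[k][0]) for k in idx}
            sad = sum(dev.values())
            if best is None or sad < best[0]:
                # point off the line with the largest absolute deviation
                far = max((k for k in idx if k not in (i, j)), key=dev.get)
                best = (sad, far)
        O[best[1]] += 1
    return O

def outliers(points, stop_fraction):
    """Sketch of Algorithm 2; returns outlier indices in order of detection."""
    n = len(points)
    S = list(range(n))       # indices still under examination
    C, D = [], []            # temporary set, detected outliers
    lms = 0                  # last maximum score (LMS)
    while len(S) > stop_fraction * n:
        m = len(S)
        O = o_scores([points[k] for k in S])
        pos = max(range(m), key=O.__getitem__)   # first index with maximal O
        if O[pos] == m - 1:
            if lms != 0 and O[pos] != lms - 1:
                break        # the decreasing sequence cannot be continued
            D.append(S.pop(pos))
            lms = m - 1
            S = sorted(S + C)    # reconsider the points set aside so far
            C = []
        else:
            C.append(S.pop(pos))
    return D

noise = [0.1, -0.2, 0.05, 0.3, -0.15, 0.2, -0.1, 0.25]
pts = [(float(x), 2 * x + 1 + e) for x, e in enumerate(noise)]
pts.append((3.5, 50.0))      # index 8: a gross outlier
detected = outliers(pts, stop_fraction=0.5)
print(detected)
```

The gross outlier (index 8) has $O = m - 1$ in the first pass and is detected first. Whether further points are added depends on the data, because a regular point that has the largest deviation in every remaining fit also satisfies the condition of step (6).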



4 Some Examples 

In this section we illustrate the proposed algorithms and compare them with
two other methods using several real and simulated data sets.

One of the methods is the P-R plot proposed by Hadi [5] to aid in classifying
observations as leverage points, outliers or combinations of both. Some authors
have suggested that points with $h_{ii} > 2(p+1)/n$, where $h_{ii}$ is the $i$th diagonal element
of the matrix $H$, $p$ is the number of predictors and $n$ the number of observations,
can be classified as leverage points, and that points with $|r_i| / (\hat{\sigma} \sqrt{1 - h_{ii}}) > 2$, where $r_i$
is the residual of the $i$th observation and $\hat{\sigma}$ is an estimator of the standard
deviation of the errors, can be classified as outliers. In what follows, the use of
these suggested cut-off points to classify the observations will be referred to as
the classical method.

The first data set, 'Telephone', relates the number of international telephone
calls from Belgium (in tens of millions in minutes) to the variable year for
24 years and can be found in [7]. Cases 15-20 are unusually high and they
are outliers. The second one, 'Hawkins', consists of 75 observations in four
dimensions, one response variable and three predictor variables, and can be
found in [6]. It has been constructed for the study of special pathological
phenomena in the detection of outliers and leverage points, and cases 1-10 are
outliers and leverage points. The data set 'Scottish' describes how the record
times (in seconds) of 35 Scottish hill races are related to two predictor variables,
distance of the race (in miles) and climb (in feet), and can be found in [5]. The
data contain two clear outliers (observations 7 and 18). The last two data sets
have been created by the authors. The data set 'twovariables' consists of 56
observations on one predictor variable and a response variable. The predictor
was generated as uniform (0, 10) and the response variable to be consistent with
the model $Y = X_1 + 4 + \varepsilon$ with $\varepsilon \sim N(0, 1)$. Three observations (51-53) have
been conceived as leverage points and three others (54-56) as outliers. The
other data set, 'threevariables', is the three-variable equivalent of the preceding
one (two predictor variables).

Computation has been performed with a computer code in Splus, and the 
results are summarized in Table 1. 



| Data | Method | Leverages | Outliers |
|---|---|---|---|
| Telephone | Classical Method | - | 20 |
| | Hadi's Method | - | 19, 20 |
| | Our Results | - | 17-20 |
| Hawkins | Classical Method | 12-14 | 7, 11-14 |
| | Hadi's Method | 14 | 7, 11-14 |
| | Our Results | 3-6, 9, 10, 13 | 11-14 |
| Scottish | Classical Method | 7, 11, 33, 35 | 7, 18 |
| | Hadi's Method | 7, 11 | 7, 18 |
| | Our Results | 11, 17, 35 | 7, 18, 33 |
| Twovariables | Classical Method | 51-53 | 54-56 |
| | Hadi's Method | 51-53 | 54-56 |
| | Our Results | 52 | 54-56 |
| Threevariables | Classical Method | 18, 51-53 | 54-56 |
| | Hadi's Method | 51-53 | 54-56 |
| | Our Results | 51-53 | 9, 37, 54-56 |

Table 1. Detection of outliers and leverage points according to different methods.

As we can see in Table 1, our proposed method performed best on the data set
'Telephone', detecting cases 17-20 as outliers. The other methods identified at
most cases 19 and 20, because observations 19 and 20 mask all the others.

In the case of the 'Hawkins' data, our proposed method as well as the other two
methods failed to identify the outliers. Moreover, all three methods swamp
the good cases 11-14, which are flagged as outliers. On the other hand, our method
detected almost all leverage points.

In the data set 'Scottish', Table 1 shows that all three methods correctly
identified observations 7 and 18 as outliers. These observations mask
observation 33, which is detected by our proposed method. Observation 11 is
suitably detected as a leverage point by all three methods, but there are other
observations identified as leverage points by only one or two methods.

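For the simulated data sets, all methods detected all outliers, although our
method additionally flagged observations 9 and 37 in 'threevariables'. However,
our proposed method identified only one of the three leverage points
(observation 52) in 'twovariables'.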



5 Conclusion 



The computation of the scores requires the determination of a certain number
of LAD regression models, and this is computationally more expensive than the
usual methods. However, it is important to note that the principle is very simple,
and takes into account natural requirements for robustness in the detection of
influential observations. Nowadays, the performance of ordinary notebook computers
is more than sufficient to perform in a few seconds the computations for the
above examples, so the new tools are suitable for applications in statistical
methodology.

The code has been implemented in Splus and is available at the web site of
the second author, http://www.unine.ch/statistics/melfi/lad.html. A variety
of data sets, including the simulated datasets used in Section 4, is also available
on the same web site.